-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transaction on_commit before signals for alert group actions #3731
Conversation
this may also close #3729 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, I see that it worked in the past, but do we have a transaction blocks in these places?
@@ -686,11 +687,7 @@ def acknowledge_by_source(self): | |||
f"log record {log_record.pk} with type '{log_record.get_type_display()}', action source: alert" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging is incorrect
log_record=log_record.pk, | ||
action_source=action_source, | ||
) | ||
transaction.on_commit(partial(send_alert_group_signal.delay, log_record.pk)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it called inside transaction block
alert_group_action_triggered_signal.send( | ||
sender=self.acknowledge_by_user, | ||
log_record=log_record.pk, | ||
action_source=action_source, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be sure, are you dropping action_source
on purpose here (and below)?
Also checking, we are ok making this async now, right? (I guess so, and it makes sense; will take a look)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll double-check this, I think only slack uses action_source from the signal but force_sync was used instead (for the delete case). In all cases it should be able to be taken from the log record instead of it being passed on the signal, I wanted to avoid changing the signature on send_alert_group_signal/having to add another task with a different signature.
engine/apps/telegram/tasks.py
Outdated
logger.warning( | ||
f"AlertGroupTelegramRepresentative: log record {log_record_id} never created or has been deleted" | ||
) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
log_record=log_record_pk, | ||
action_source=None, | ||
force_sync=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we trigger the action triggered signal and queue the cleanup task after that. Makes sense 👍
@@ -10,7 +11,7 @@ | |||
@shared_dedicated_queue_retry_task( | |||
autoretry_for=(Exception,), retry_backoff=True, max_retries=1 if settings.DEBUG else None | |||
) | |||
def delete_alert_group(alert_group_pk, user_pk): | |||
def delete_alert_group(alert_group_pk: int, user_pk: int) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not for this PR, but wondering if this still needs to be a task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep it for now, I guess something could go wrong in the stop escalation part of delete and we could retry here.
We do not, going to test this on dev to see if it improves the issue. It comes down to things should have autocommit but we keep finding cases where the objects are missing. |
Added back raising exceptions. In most cases they will not retry and we will get a log message and task failure. In the delete cases only a log message is written, no reason to raise an exception if it was already deleted. There are a couple cases for slack and telegram where it is inside of a signal call, here will still retry to not change the behavior at the top level. If we want to not retry the signal cases we will need to add an intermediate task to isolate the one handler from others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
# What this PR does Add transactions around log record creation and check transaction on_commit before sending signals passing DB id of alert group log records. In cases for delete we can then assume any missing IDs on tasks are from intentionally deleted alert groups and we can stop tasks from retrying endlessly. ## Which issue(s) this PR fixes ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)
What this PR does
Add transactions around log record creation and check transaction on_commit before sending signals passing DB id of alert group log records. In cases for delete we can then assume any missing IDs on tasks are from intentionally deleted alert groups and we can stop tasks from retrying endlessly.
Which issue(s) this PR fixes
Checklist
pr:no public docs
PR label added if not required)CHANGELOG.md
updated (orpr:no changelog
PR label added if not required)